by Ha Phuong Thao (Joanna)
This project is part of the Data Analyst Nanodegree program.
In this exercise, I will explore a financial data set, “Prosper Loans”. Prosper is an American peer-to-peer lending company that offers personal loans at low rates. These loans are unsecured, which means you do not have to put up any collateral (like a house or a car) that could be taken away if you can’t make payments. Each loan is typically funded by multiple people across the United States. In this way, Prosper is a marketplace connecting those who need a loan to those who have extra money to lend. In this dataset, I will explore patterns across 81 variables and 113,937 loan observations.
First, I run some basic functions to examine the structure and schema of the data set.
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
## $ CreditGrade : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CurrentlyInGroup : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
## $ GroupKey : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
## $ DateCreditPulled : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ MemberKey : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
## ListingKey ListingNumber
## 17A93590655669644DB4C06: 6 Min. : 4
## 349D3587495831350F0F648: 4 1st Qu.: 400919
## 47C1359638497431975670B: 4 Median : 600554
## 8474358854651984137201C: 4 Mean : 627886
## DE8535960513435199406CE: 4 3rd Qu.: 892634
## 04C13599434217079754AEE: 3 Max. :1255725
## (Other) :113912
## ListingCreationDate CreditGrade Term
## 2013-10-02 17:20:16.550000000: 6 :84984 Min. :12.00
## 2013-08-28 20:31:41.107000000: 4 C : 5649 1st Qu.:36.00
## 2013-09-08 09:27:44.853000000: 4 D : 5153 Median :36.00
## 2013-12-06 05:43:13.830000000: 4 B : 4389 Mean :40.83
## 2013-12-06 11:44:58.283000000: 4 AA : 3509 3rd Qu.:36.00
## 2013-08-21 07:25:22.360000000: 3 HR : 3508 Max. :60.00
## (Other) :113912 (Other): 6745
## LoanStatus ClosedDate
## Current :56576 :58848
## Completed :38074 2014-03-04 00:00:00: 105
## Chargedoff :11992 2014-02-19 00:00:00: 100
## Defaulted : 5018 2014-02-11 00:00:00: 92
## Past Due (1-15 days) : 806 2012-10-30 00:00:00: 81
## Past Due (31-60 days): 363 2013-02-26 00:00:00: 78
## (Other) : 1108 (Other) :54633
## BorrowerAPR BorrowerRate LenderYield
## Min. :0.00653 Min. :0.0000 Min. :-0.0100
## 1st Qu.:0.15629 1st Qu.:0.1340 1st Qu.: 0.1242
## Median :0.20976 Median :0.1840 Median : 0.1730
## Mean :0.21883 Mean :0.1928 Mean : 0.1827
## 3rd Qu.:0.28381 3rd Qu.:0.2500 3rd Qu.: 0.2400
## Max. :0.51229 Max. :0.4975 Max. : 0.4925
## NA's :25
## EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## Min. :-0.183 Min. :0.005 Min. :-0.183
## 1st Qu.: 0.116 1st Qu.:0.042 1st Qu.: 0.074
## Median : 0.162 Median :0.072 Median : 0.092
## Mean : 0.169 Mean :0.080 Mean : 0.096
## 3rd Qu.: 0.224 3rd Qu.:0.112 3rd Qu.: 0.117
## Max. : 0.320 Max. :0.366 Max. : 0.284
## NA's :29084 NA's :29084 NA's :29084
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## Min. :1.000 :29084 Min. : 1.00
## 1st Qu.:3.000 C :18345 1st Qu.: 4.00
## Median :4.000 B :15581 Median : 6.00
## Mean :4.072 A :14551 Mean : 5.95
## 3rd Qu.:5.000 D :14274 3rd Qu.: 8.00
## Max. :7.000 E : 9795 Max. :11.00
## NA's :29084 (Other):12307 NA's :29084
## ListingCategory..numeric. BorrowerState
## Min. : 0.000 CA :14717
## 1st Qu.: 1.000 TX : 6842
## Median : 1.000 NY : 6729
## Mean : 2.774 FL : 6720
## 3rd Qu.: 3.000 IL : 5921
## Max. :20.000 : 5515
## (Other):67493
## Occupation EmploymentStatus
## Other :28617 Employed :67322
## Professional :13628 Full-time :26355
## Computer Programmer : 4478 Self-employed: 6134
## Executive : 4311 Not available: 5347
## Teacher : 3759 Other : 3806
## Administrative Assistant: 3688 : 2255
## (Other) :55456 (Other) : 2718
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## Min. : 0.00 False:56459 False:101218
## 1st Qu.: 26.00 True :57478 True : 12719
## Median : 67.00
## Mean : 96.07
## 3rd Qu.:137.00
## Max. :755.00
## NA's :7625
## GroupKey DateCreditPulled
## :100596 2013-12-23 09:38:12: 6
## 783C3371218786870A73D20: 1140 2013-11-21 09:09:41: 4
## 3D4D3366260257624AB272D: 916 2013-12-06 05:43:16: 4
## 6A3B336601725506917317E: 698 2014-01-14 20:17:49: 4
## FEF83377364176536637E50: 611 2014-02-09 12:14:41: 4
## C9643379247860156A00EC0: 342 2013-09-27 22:04:54: 3
## (Other) : 9634 (Other) :113912
## CreditScoreRangeLower CreditScoreRangeUpper
## Min. : 0.0 Min. : 19.0
## 1st Qu.:660.0 1st Qu.:679.0
## Median :680.0 Median :699.0
## Mean :685.6 Mean :704.6
## 3rd Qu.:720.0 3rd Qu.:739.0
## Max. :880.0 Max. :899.0
## NA's :591 NA's :591
## FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
## : 697 Min. : 0.00 Min. : 0.00
## 1993-12-01 00:00:00: 185 1st Qu.: 7.00 1st Qu.: 6.00
## 1994-11-01 00:00:00: 178 Median :10.00 Median : 9.00
## 1995-11-01 00:00:00: 168 Mean :10.32 Mean : 9.26
## 1990-04-01 00:00:00: 161 3rd Qu.:13.00 3rd Qu.:12.00
## 1995-03-01 00:00:00: 159 Max. :59.00 Max. :54.00
## (Other) :112389 NA's :7604 NA's :7604
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 2.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.75 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
## NA's :697
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## Min. : 0.0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 114.0 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 271.0 Median : 1.000 Median : 4.000
## Mean : 398.3 Mean : 1.435 Mean : 5.584
## 3rd Qu.: 525.0 3rd Qu.: 2.000 3rd Qu.: 7.000
## Max. :14985.0 Max. :105.000 Max. :379.000
## NA's :697 NA's :1159
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0 Median : 0.000
## Mean : 0.5921 Mean : 984.5 Mean : 4.155
## 3rd Qu.: 0.0000 3rd Qu.: 0.0 3rd Qu.: 3.000
## Max. :83.0000 Max. :463881.0 Max. :99.000
## NA's :697 NA's :7622 NA's :990
## PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
## Min. : 0.0000 Min. : 0.000 Min. : 0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 3121
## Median : 0.0000 Median : 0.000 Median : 8549
## Mean : 0.3126 Mean : 0.015 Mean : 17599
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 19521
## Max. :38.0000 Max. :20.000 Max. :1435667
## NA's :697 NA's :7604 NA's :7604
## BankcardUtilization AvailableBankcardCredit TotalTrades
## Min. :0.000 Min. : 0 Min. : 0.00
## 1st Qu.:0.310 1st Qu.: 880 1st Qu.: 15.00
## Median :0.600 Median : 4100 Median : 22.00
## Mean :0.561 Mean : 11210 Mean : 23.23
## 3rd Qu.:0.840 3rd Qu.: 13180 3rd Qu.: 30.00
## Max. :5.950 Max. :646285 Max. :126.00
## NA's :7604 NA's :7544 NA's :7544
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## Min. :0.000 Min. : 0.000
## 1st Qu.:0.820 1st Qu.: 0.000
## Median :0.940 Median : 0.000
## Mean :0.886 Mean : 0.802
## 3rd Qu.:1.000 3rd Qu.: 1.000
## Max. :1.000 Max. :20.000
## NA's :7544 NA's :7544
## DebtToIncomeRatio IncomeRange IncomeVerifiable
## Min. : 0.000 $25,000-49,999:32192 False: 8669
## 1st Qu.: 0.140 $50,000-74,999:31050 True :105268
## Median : 0.220 $100,000+ :17337
## Mean : 0.276 $75,000-99,999:16916
## 3rd Qu.: 0.320 Not displayed : 7741
## Max. :10.010 $1-24,999 : 7274
## NA's :8554 (Other) : 1427
## StatedMonthlyIncome LoanKey TotalProsperLoans
## Min. : 0 CB1B37030986463208432A1: 6 Min. :0.00
## 1st Qu.: 3200 2DEE3698211017519D7333F: 4 1st Qu.:1.00
## Median : 4667 9F4B37043517554537C364C: 4 Median :1.00
## Mean : 5608 D895370150591392337ED6D: 4 Mean :1.42
## 3rd Qu.: 6825 E6FB37073953690388BC56D: 4 3rd Qu.:2.00
## Max. :1750003 0D8F37036734373301ED419: 3 Max. :8.00
## (Other) :113912 NA's :91852
## TotalProsperPaymentsBilled OnTimeProsperPayments
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 9.00
## Median : 16.00 Median : 15.00
## Mean : 22.93 Mean : 22.27
## 3rd Qu.: 33.00 3rd Qu.: 32.00
## Max. :141.00 Max. :141.00
## NA's :91852 NA's :91852
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 0.61 Mean : 0.05
## 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :42.00 Max. :21.00
## NA's :91852 NA's :91852
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## Min. : 0 Min. : 0
## 1st Qu.: 3500 1st Qu.: 0
## Median : 6000 Median : 1627
## Mean : 8472 Mean : 2930
## 3rd Qu.:11000 3rd Qu.: 4127
## Max. :72499 Max. :23451
## NA's :91852 NA's :91852
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## Min. :-209.00 Min. : 0.0
## 1st Qu.: -35.00 1st Qu.: 0.0
## Median : -3.00 Median : 0.0
## Mean : -3.22 Mean : 152.8
## 3rd Qu.: 25.00 3rd Qu.: 0.0
## Max. : 286.00 Max. :2704.0
## NA's :95009
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 9.00 1st Qu.: 6.0 1st Qu.: 37332
## Median :14.00 Median : 21.0 Median : 68599
## Mean :16.27 Mean : 31.9 Mean : 69444
## 3rd Qu.:22.00 3rd Qu.: 65.0 3rd Qu.:101901
## Max. :44.00 Max. :100.0 Max. :136486
## NA's :96985
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## Min. : 1000 2014-01-22 00:00:00: 491 Q4 2013:14450
## 1st Qu.: 4000 2013-11-13 00:00:00: 490 Q1 2014:12172
## Median : 6500 2014-02-19 00:00:00: 439 Q3 2013: 9180
## Mean : 8337 2013-10-16 00:00:00: 434 Q2 2013: 7099
## 3rd Qu.:12000 2014-01-28 00:00:00: 339 Q3 2012: 5632
## Max. :35000 2013-09-24 00:00:00: 316 Q2 2012: 5061
## (Other) :111428 (Other):60343
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 63CA34120866140639431C9: 9 Min. : 0.0 Min. : -2.35
## 16083364744933457E57FB9: 8 1st Qu.: 131.6 1st Qu.: 1005.76
## 3A2F3380477699707C81385: 8 Median : 217.7 Median : 2583.83
## 4D9C3403302047712AD0CDD: 8 Mean : 272.5 Mean : 4183.08
## 739C338135235294782AE75: 8 3rd Qu.: 371.6 3rd Qu.: 5548.40
## 7E1733653050264822FAA3D: 8 Max. :2251.5 Max. :40702.39
## (Other) :113888
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## Min. : 0.0 Min. : -2.35 Min. :-664.87
## 1st Qu.: 500.9 1st Qu.: 274.87 1st Qu.: -73.18
## Median : 1587.5 Median : 700.84 Median : -34.44
## Mean : 3105.5 Mean : 1077.54 Mean : -54.73
## 3rd Qu.: 4000.0 3rd Qu.: 1458.54 3rd Qu.: -13.92
## Max. :35000.0 Max. :15617.03 Max. : 32.06
##
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## Min. :-9274.75 Min. : -94.2 Min. : -954.5
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.00 Median : 0.0 Median : 0.0
## Mean : -14.24 Mean : 700.4 Mean : 681.4
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. : 0.00 Max. :25000.0 Max. :25000.0
##
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## Min. : 0.00 Min. :0.7000 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.: 0.00000
## Median : 0.00 Median :1.0000 Median : 0.00000
## Mean : 25.14 Mean :0.9986 Mean : 0.04803
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.: 0.00000
## Max. :21117.90 Max. :1.0125 Max. :39.00000
##
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## Min. : 0.00000 Min. : 0.00 Min. : 1.00
## 1st Qu.: 0.00000 1st Qu.: 0.00 1st Qu.: 2.00
## Median : 0.00000 Median : 0.00 Median : 44.00
## Mean : 0.02346 Mean : 16.55 Mean : 80.48
## 3rd Qu.: 0.00000 3rd Qu.: 0.00 3rd Qu.: 115.00
## Max. :33.00000 Max. :25000.00 Max. :1189.00
##
## [1] "ListingKey"
## [2] "ListingNumber"
## [3] "ListingCreationDate"
## [4] "CreditGrade"
## [5] "Term"
## [6] "LoanStatus"
## [7] "ClosedDate"
## [8] "BorrowerAPR"
## [9] "BorrowerRate"
## [10] "LenderYield"
## [11] "EstimatedEffectiveYield"
## [12] "EstimatedLoss"
## [13] "EstimatedReturn"
## [14] "ProsperRating..numeric."
## [15] "ProsperRating..Alpha."
## [16] "ProsperScore"
## [17] "ListingCategory..numeric."
## [18] "BorrowerState"
## [19] "Occupation"
## [20] "EmploymentStatus"
## [21] "EmploymentStatusDuration"
## [22] "IsBorrowerHomeowner"
## [23] "CurrentlyInGroup"
## [24] "GroupKey"
## [25] "DateCreditPulled"
## [26] "CreditScoreRangeLower"
## [27] "CreditScoreRangeUpper"
## [28] "FirstRecordedCreditLine"
## [29] "CurrentCreditLines"
## [30] "OpenCreditLines"
## [31] "TotalCreditLinespast7years"
## [32] "OpenRevolvingAccounts"
## [33] "OpenRevolvingMonthlyPayment"
## [34] "InquiriesLast6Months"
## [35] "TotalInquiries"
## [36] "CurrentDelinquencies"
## [37] "AmountDelinquent"
## [38] "DelinquenciesLast7Years"
## [39] "PublicRecordsLast10Years"
## [40] "PublicRecordsLast12Months"
## [41] "RevolvingCreditBalance"
## [42] "BankcardUtilization"
## [43] "AvailableBankcardCredit"
## [44] "TotalTrades"
## [45] "TradesNeverDelinquent..percentage."
## [46] "TradesOpenedLast6Months"
## [47] "DebtToIncomeRatio"
## [48] "IncomeRange"
## [49] "IncomeVerifiable"
## [50] "StatedMonthlyIncome"
## [51] "LoanKey"
## [52] "TotalProsperLoans"
## [53] "TotalProsperPaymentsBilled"
## [54] "OnTimeProsperPayments"
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"
## [57] "ProsperPrincipalBorrowed"
## [58] "ProsperPrincipalOutstanding"
## [59] "ScorexChangeAtTimeOfListing"
## [60] "LoanCurrentDaysDelinquent"
## [61] "LoanFirstDefaultedCycleNumber"
## [62] "LoanMonthsSinceOrigination"
## [63] "LoanNumber"
## [64] "LoanOriginalAmount"
## [65] "LoanOriginationDate"
## [66] "LoanOriginationQuarter"
## [67] "MemberKey"
## [68] "MonthlyLoanPayment"
## [69] "LP_CustomerPayments"
## [70] "LP_CustomerPrincipalPayments"
## [71] "LP_InterestandFees"
## [72] "LP_ServiceFees"
## [73] "LP_CollectionFees"
## [74] "LP_GrossPrincipalLoss"
## [75] "LP_NetPrincipalLoss"
## [76] "LP_NonPrincipalRecoverypayments"
## [77] "PercentFunded"
## [78] "Recommendations"
## [79] "InvestmentFromFriendsCount"
## [80] "InvestmentFromFriendsAmount"
## [81] "Investors"
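The three listings above have the shape of what base R's `str()`, `summary()` and `names()` print. The original code is not shown, so the following is only a minimal sketch on a tiny stand-in data frame; the name `loans_demo` and its values are hypothetical and chosen for illustration.

```r
# Hypothetical stand-in for the Prosper data (three rows, four columns).
loans_demo <- data.frame(
  Term = c(36L, 36L, 60L),
  BorrowerRate = c(0.158, 0.092, 0.275),
  LoanOriginalAmount = c(9425L, 10000L, 3001L),
  LoanStatus = factor(c("Completed", "Current", "Completed"))
)

str(loans_demo)      # column types and first values
summary(loans_demo)  # numeric summaries and factor counts
names(loans_demo)    # column names as a character vector
dim(loans_demo)      # number of rows and columns
```

On the full data set the same calls would presumably be applied to the data frame read from the Prosper CSV file.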
Prosper lending dropped significantly in 2009 because of its relaunch, after which Prosper became a lending platform focused on prime and near-prime borrowers. It then started to scale again and grew rapidly in 2013, suggesting that online lending is a promising channel for both investors and borrowers.
The P2P lending platform is geographically restricted: not all states are open to Prosper loans. California, Florida, Georgia, Illinois, New York, Ohio and Texas account for the largest share of loans.
Borrowers take Prosper loans for reasons including debt consolidation, home improvement, business, personal loans, auto and others. The most popular purpose by far is debt consolidation, for example paying off credit card debt, which accounts for almost 60,000 loans. This is not surprising, since a Prosper loan can save borrowers a lot of money compared with other loans and, most importantly, a borrower’s credit score can rise shortly after consolidating, especially when a high credit utilization ratio has been hurting it.
Let’s now take a look at the distribution of each metric on loans made over the past several years.
Prosper Rating is the letter grade (AA to HR) assigned to a borrower. It is a proprietary system, similar to a credit score in that it predicts the likelihood of loan default. Prosper uses this rating to set the pricing of a loan.
Prosper Score is a risk score built using historical Prosper data to assess the risk of a Prosper borrower listing. In this data it ranges from 1 to 11, with 11 being the best (lowest risk) score and 1 the worst (highest risk).
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
Credit Score, Prosper Rating and Prosper Score are the three numbers that indicate the credit health of a loan. As the plots show, the bulk of borrowers fall in Prosper Ratings A, B, C and D and in Prosper Scores from 4 to 8, with rating C and score 5 being the most common. Furthermore, most borrowers have a lower-bound credit score between 620 and 700. Even though Prosper requires a minimum credit score of 640 to qualify for a personal loan, some borrowers in the list still have a credit score below 640.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The mean interest rate across all Prosper loans is fairly substantial at 19.28%. Interestingly, there is a noticeably large number of loans at a rate of about 32%, suggesting that many investors are interested in higher-risk investments, which pay a higher interest rate in return.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
Lender yield is equal to the interest rate on the loan less the servicing fee paid by the borrowers. This is also one of the most important inputs into any return calculation.
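As a rough check of this relation, the means of the two summaries above can be subtracted from each other. This is a sketch on the printed summary values only, assuming yield = rate − fee holds on average; the variable names are illustrative.

```r
# Means copied from the BorrowerRate and LenderYield summaries above.
mean_borrower_rate <- 0.1928
mean_lender_yield  <- 0.1827

# Implied average servicing fee, assuming yield = rate - fee.
implied_fee <- mean_borrower_rate - mean_lender_yield
round(implied_fee, 4)
```

The result suggests a servicing fee of roughly one percentage point, which is consistent with the minimum yield of -0.01 sitting exactly 0.01 below the minimum rate of 0.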
Borrower’s Financial Information
More than half of the loans (roughly 55%) went to borrowers with an income of 25,000 to 75,000 USD. Loans in this dataset range only from 1,000 to 35,000 USD, so this makes sense: people earning 75,000 to 100,000 USD are less likely to borrow than those with a lower income. These borrowers may also be young adults at the start of their careers.
Again, we see that people with a monthly income of 2,000 to 8,000 USD are the ones most in need of a personal loan.
The vast majority of borrowers are employed or have a full-time job. This suggests that they are able to pay off the loan and meet the requirements to be issued one. However, retired and not-employed people borrow more than those with a part-time job. One possible explanation is that part-time workers are more likely to be students, while retired and not-employed people are former professionals with more years of earning behind them, more responsibilities in life and therefore a greater need for a loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 2.167 5.583 8.006 11.417 62.917 7625
The data is positively skewed and long-tailed. Employment duration has a median of about 5.6 years and a mean of about 8 years, which suggests that young professionals are the main customers of Prosper personal loans.
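The summary in years above matches the quartiles of `EmploymentStatusDuration` (recorded in months) divided by 12. A minimal sketch of that conversion, using the quartiles printed in the full summary earlier; the vector names are illustrative.

```r
# Quartiles of EmploymentStatusDuration in months, from the summary output.
duration_months <- c(min = 0, q1 = 26, median = 67, q3 = 137, max = 755)

# Convert months to years and round to three decimals.
duration_years <- round(duration_months / 12, 3)
duration_years
```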
A debt-to-income (DTI) ratio below 36% is generally preferable. According to the summary above, half of Prosper loans have a DTI ratio between 14% and 32%, with a median of 22%. The lower the ratio, the better the chance that an individual will be able to get a loan.
People most often borrowed between 1,000 and 10,000 USD. They tend to request round amounts such as 10,000, 15,000, 20,000 or 25,000 USD and are less likely to request amounts in between.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
After removing 0.01 percent of the data as outliers, we have a clearer picture of this figure. According to the summary above, the median and mean monthly payments are about 218 USD and 272 USD respectively.
The higher the credit card utilization, the more people opt for a personal loan. The general rule of thumb is to keep your balance at or below 33% of your total available credit to protect your credit score. Borrowers using 66% or more of their available credit may also carry a higher investment risk.
Borrowers can take out repeat Prosper loans, and many of them have had one previous loan.
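The 33% and 66% thresholds can be turned into utilization bands with `cut()`. The sketch below uses the first non-missing `BankcardUtilization` values from the structure listing; the band labels are hypothetical and not part of the original analysis.

```r
# First values of BankcardUtilization from the structure listing (NA dropped).
utilization <- c(0, 0.21, 0.04, 0.81, 0.39, 0.72, 0.13, 0.11, 0.11)

# Hypothetical bands built from the 33% and 66% thresholds in the text.
band <- cut(utilization,
            breaks = c(-Inf, 0.33, 0.66, Inf),
            labels = c("low", "medium", "high"))
table(band)
```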
Prosper offers three loan terms: 1, 3 and 5 years. More than 50% of borrowers signed up for the 3-year term, while the 1-year term seems much less attractive than the others.
As we can see, there is a remarkable change in the number of investors in 2013 and 2014. Loans funded entirely by a single investor increase considerably. Compared with loans funded by multiple investors, single-funded loans make up a much larger share of the graph. This trend is expected to grow strongly in the near future.
The original dataset contains 113,937 loan records with 81 variables. Based on my analysis, the key variables fall into three groups: the borrower’s credit information (Credit Grade, Prosper Rating, Prosper Score), the borrower’s financial health (Debt to Income Ratio, Bankcard Utilization, Current Credit Lines) and the estimated investment return (Lender Yield, Borrower Rate, number of investors).
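One way to quantify this trend is the share of loans per year that have exactly one investor. The following is a sketch on hypothetical toy rows, not the original code; `investor_demo` and its values are made up for illustration.

```r
# Hypothetical toy rows: origination year and number of investors per loan.
investor_demo <- data.frame(
  year = c(2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014, 2014),
  Investors = c(120L, 45L, 1L, 1L, 1L, 80L, 30L, 1L, 1L)
)

# A loan counts as single funded when exactly one investor holds it.
investor_demo$single_funded <- investor_demo$Investors == 1

# Share of single-funded loans in each year.
single_share <- tapply(investor_demo$single_funded, investor_demo$year, mean)
single_share
```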
Other observations:
* Nearly 25% of loans over the whole period were funded by a single investor, and this share grew to about 50% in 2013 and 2014.
* Most loans are issued for debt consolidation.
* The median borrower rate is 18.4% and the maximum rate is 49.75%.
* 50% of Prosper borrowers have an employment duration of less than 6 years.
* The median Debt to Income ratio is 22%.
First, I’m interested in Prosper Rating and Prosper Score, which reflect the borrower’s credit history and are used to predict the loan price as well as the likelihood of default. From another point of view, Borrower Rate and Lender Yield are of most concern to lenders, since these figures anticipate returns to some degree. Digging deeper into the dataset, I would like to take the perspective of an investor and examine the relationship between the borrower profile and the estimated return, estimated loss and chance of default on each loan.
So far, I think Debt to Income Ratio, employment duration and monthly loan payment might have meaningful relationships with Prosper Rating and Prosper Score. Loan term is another factor to take into account; I think it relates closely to loan purpose and credit health, which I will explore later on. The number of loans funded by a single investor is an intriguing figure, since it increased sharply in the last two years. Analyzing credit grade, lender yield, borrower rate and the number of delinquencies could help explain this trend from an investor’s point of view.
I created new variables for the loan origination month and year, and converted loan status into a factor.
Yes, there are some unusual distributions. A borrower rate of about 32% occurs far more often than its nearest rates, such as 30% and 34%, and the same pattern appears in the lender yield distribution. Additionally, borrowers choose the 3-year term most of the time; this is worth investigating further to find out why. I transformed some variables into ordered categories, since this helps the code run more quickly and gives a better view of their distributions. I also transformed the date-time variables to extract the month and year of each transaction.
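A minimal sketch of such a transformation in base R follows. The toy rows are hypothetical, and the mapping of `LoanStatus` into the four groups used in the grouped summaries (Cancelled, Current or Paid, Defaulted, Past Due) is an assumption, since the original mapping is not shown.

```r
# Hypothetical toy rows in the same text format as LoanOriginationDate.
status_demo <- data.frame(
  LoanOriginationDate = c("2007-09-12 00:00:00", "2014-03-03 00:00:00",
                          "2013-11-01 00:00:00", "2009-01-20 00:00:00"),
  LoanStatus = c("Completed", "Current", "Past Due (1-15 days)", "Chargedoff"),
  stringsAsFactors = FALSE
)

# Parse the date part and derive year and year-month columns.
origination <- as.Date(substr(status_demo$LoanOriginationDate, 1, 10))
status_demo$year <- as.integer(format(origination, "%Y"))
status_demo$year_month <- format(origination, "%Y-%m")

# Collapse LoanStatus into four groups (this mapping is an assumption).
status_demo$status <- "Current or Paid"
status_demo$status[status_demo$LoanStatus == "Cancelled"] <- "Cancelled"
status_demo$status[status_demo$LoanStatus %in% c("Chargedoff", "Defaulted")] <- "Defaulted"
status_demo$status[grepl("^Past Due", status_demo$LoanStatus)] <- "Past Due"
status_demo$status <- factor(status_demo$status,
                             levels = c("Cancelled", "Current or Paid",
                                        "Defaulted", "Past Due"))
status_demo[, c("year_month", "status")]
```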
##
## Pearson's product-moment correlation
##
## data: ProsperRating..numeric. and LenderYield
## t = -917.52, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9537315 -0.9524993
## sample estimates:
## cor
## -0.9531194
##
## Pearson's product-moment correlation
##
## data: ProsperRating..numeric. and EstimatedLoss
## t = -1058.9, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9646522 -0.9637054
## sample estimates:
## cor
## -0.9641819
In these analyses, both interest rates and losses are grouped by Prosper Rating. As we can see, the lower the Prosper Rating, the higher the yield lenders could earn and the more money they could lose. Therefore, simply by looking at the Prosper Rating we can make a first estimate of the return and the risk lenders take. Loans with higher returns also have a higher likelihood of default.
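The two reports above have the layout printed by base R's `cor.test()`. The sketch below reproduces the call on synthetic data, which is an assumption made purely for illustration; the coefficients are made up and only the direction of the relationship mirrors the real result.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic stand-in: ratings 1..7 and a yield that falls as the rating rises.
rating <- sample(1:7, 500, replace = TRUE)
yield  <- 0.35 - 0.04 * rating + rnorm(500, sd = 0.01)

# Pearson correlation test; pairs with missing values are dropped.
result <- cor.test(rating, yield, method = "pearson")
result$estimate   # correlation coefficient
result$conf.int   # 95 percent confidence interval
result$parameter  # degrees of freedom (number of complete pairs minus 2)
```

The degrees of freedom reported above (84851) fit this rule: 113,937 loans minus 29,084 missing ratings leaves 84,853 complete pairs.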
## prosper_loan$status: Cancelled
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0975 0.1345 0.1950 0.1784 0.2325 0.2325
## --------------------------------------------------------
## prosper_loan$status: Current or Paid
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1174 0.1660 0.1749 0.2299 0.4925
## --------------------------------------------------------
## prosper_loan$status: Defaulted
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1605 0.2250 0.2210 0.2820 0.4800
## --------------------------------------------------------
## prosper_loan$status: Past Due
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0499 0.1785 0.2299 0.2244 0.2816 0.3335
As I expected, defaulted loans have a higher median Lender Yield than Cancelled and Current or Paid loans. The median lender yield of the defaulted loans was 22.5%, just slightly lower than that of Past Due loans.
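Grouped summaries in this layout are what `by()` prints when it applies `summary()` to each group. A small sketch on hypothetical toy values (`yield_demo` is not part of the original analysis):

```r
# Hypothetical toy data: lender yield and collapsed loan status.
yield_demo <- data.frame(
  LenderYield = c(0.12, 0.17, 0.23, 0.28, 0.22, 0.16, 0.24, 0.30),
  status = factor(c("Current or Paid", "Current or Paid", "Defaulted",
                    "Defaulted", "Past Due", "Current or Paid",
                    "Past Due", "Defaulted"))
)

# One summary() per status group, printed in the same layout as above.
by(yield_demo$LenderYield, yield_demo$status, summary)

# The group medians alone, for a quicker comparison.
group_median <- tapply(yield_demo$LenderYield, yield_demo$status, median)
group_median
```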
##
## Pearson's product-moment correlation
##
## data: OpenRevolvingMonthlyPayment and EmploymentStatusDuration
## t = 59.47, df = 106310, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1736111 0.1852463
## sample estimates:
## cor
## 0.179435
It is worth noting that those who have stayed longer in the workforce tend to carry more revolving debt and make higher monthly payments on it than young professionals, although the correlation is weak (r ≈ 0.18).
## prosper_loan$Term: 12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0300 0.0829 0.1334 0.1401 0.1964 0.2569
## --------------------------------------------------------
## prosper_loan$Term: 36
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1174 0.1700 0.1834 0.2499 0.4925
## --------------------------------------------------------
## prosper_loan$Term: 60
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0569 0.1390 0.1770 0.1830 0.2219 0.3204
Looking at this, we can partly explain why many more investors opt for 3-year loans. The median yield of the 3-year term is 3.66 percentage points higher than that of the 1-year term, and the 5-year term has almost the same mean yield as the 3-year term (18.30% versus 18.34%). A 3-year term therefore looks like the most appropriate investment given its return and the length of time the money is tied up.
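The comparison between terms can be reproduced directly from the printed summaries. This sketch only does arithmetic on the medians and means shown above; the data frame name is illustrative.

```r
# Median and mean lender yield per term, copied from the output above.
yield_by_term <- data.frame(
  term_months = c(12, 36, 60),
  median = c(0.1334, 0.1700, 0.1770),
  mean   = c(0.1401, 0.1834, 0.1830)
)

# Gap to the 1-year term, in percentage points.
yield_by_term$median_gap_pp <- round(
  (yield_by_term$median - yield_by_term$median[1]) * 100, 2)
yield_by_term
```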
## prosper_loan$LoanAmount.bucket: (0,5e+03]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1500 0.2284 0.2193 0.2900 0.4975
## --------------------------------------------------------
## prosper_loan$LoanAmount.bucket: (5e+03,1.5e+04]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0100 0.1299 0.1710 0.1767 0.2195 0.3600
## --------------------------------------------------------
## prosper_loan$LoanAmount.bucket: (1.5e+04,2.5e+04]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1101 0.1400 0.1430 0.1725 0.3575
## --------------------------------------------------------
## prosper_loan$LoanAmount.bucket: (2.5e+04,3.5e+04]
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0759 0.1153 0.1302 0.1293 0.1435 0.1819
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3341283 -0.3237719
## sample estimates:
## cor
## -0.3289599
From this picture, it appears that smaller loans carry a much higher interest rate than larger ones, even though the correlation is only moderate (r ≈ -0.33). I don’t yet know why, but the pattern is intriguing enough to investigate further.
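For reference, buckets with the labels shown in the output above can be built with `cut()`. This is a minimal sketch on invented values: the break points are read off the bucket labels, but the exact call used in the analysis is an assumption.

```r
# Toy values, invented for illustration (NOT the Prosper data).
toy <- data.frame(
  LoanOriginalAmount = c(1000, 4000, 5000, 7500, 15000, 20000, 30000),
  BorrowerRate       = c(0.30, 0.25, 0.22, 0.18, 0.16, 0.14, 0.12)
)

# Right-closed intervals: a 5,000 USD loan falls in the first bucket.
toy$LoanAmount.bucket <- cut(toy$LoanOriginalAmount,
                             breaks = c(0, 5000, 15000, 25000, 35000))

# Median rate per bucket, then the overall linear correlation.
bucket_median <- tapply(toy$BorrowerRate, toy$LoanAmount.bucket, median)
toy_cor <- cor(toy$LoanOriginalAmount, toy$BorrowerRate)
```

Comparing bucket medians alongside the correlation coefficient is useful here because the decline in rate is steep between the smallest bucket and the next one and flatter afterwards, which a single linear coefficient understates.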
Almost all HR-rated loans are under 5,000 USD. The largest loans Prosper issues (25,000-35,000 USD) appear only in grades A and B; loans in this range are reliable enough and clearly yield more than grade AA. No wonder smaller loans correlate with higher borrower rates.
As we expected, 3-year loans have more default cases than 5-year loans. Additionally, the 3-year and 5-year terms are more attractive than the 1-year term because of their higher lender yield.
Speaking of default risk, we can see that the number of defaults is typically high among loans graded D, E and HR, since these are considered riskier. Surprisingly, the highest number of defaulted loans comes not from the riskiest grade, HR, but from grade D.
If you own a home, it may very well be your biggest-ticket asset, and its mortgage might be your largest debt. If you rent a home, the monthly rent is also a major expense. As we can see, there is only a small gap between the number of homeowners and non-homeowners among 5-year loans. The gap may come from borrowers with a monthly mortgage payment: since a mortgage is a large debt, they are more likely to opt for longer-duration p2p loans.
Surprisingly, borrowers who do not own a house are more likely to default. Homeownership may be considered an indicator of financial responsibility and low credit risk. After doing some research, I found that those who have been approved for a mortgage by a bank, or who actually own a house, must have demonstrated financial stability.
The majority of loans have fewer than 15 accounts open at a time, although the distribution has a long tail. Default rates are much higher for borrowers with 5 or fewer accounts. Based on this analysis, we should think very carefully before lending to a borrower with fewer than 5 open accounts.
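Because groups differ in size, it helps to compare default rates rather than raw counts. The sketch below shows one way to do that; the data are invented, and the column names `open_accounts` and `defaulted` are hypothetical stand-ins rather than the names used in the Prosper data.

```r
# Toy data, invented for illustration; column names are hypothetical.
toy <- data.frame(
  open_accounts = c(2, 3, 5, 4, 8, 10, 12, 7, 20, 1),
  defaulted     = c(TRUE, FALSE, TRUE, TRUE, FALSE,
                    FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Flag borrowers with 5 or fewer open accounts.
toy$few_accounts <- toy$open_accounts <= 5

# The mean of a logical vector is the share of TRUE values,
# so this gives the default rate within each group.
default_rate <- tapply(toy$defaulted, toy$few_accounts, mean)
default_rate
```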
So far, I haven’t found any significant correlations among the factors above, except between Prosper Rating and Lender Yield/Estimated Loss. However, I found some interesting patterns between Lender Yield/Borrower Rate and Term, and between Loan Status and Loan Original Amount. Employment Status Duration is also worth observing in association with Debt to Income Ratio and Monthly Payment.
Some stimulating patterns emerged, but I saw no particular relationship among the other features I explored.
Prosper Rating is strongly negatively correlated with Lender Yield and Estimated Loss.
## prosper_loan$ProsperRating..Alpha.: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 62.75 150.00 178.37 274.00 1035.00
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 51.00 97.53 162.00 1189.00
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 19.0 69.8 112.0 856.0
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 9.00 51.09 78.00 1024.00
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 4.00 35.00 56.23 87.00 511.00
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 25.00 35.01 55.00 279.00
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 9.00 31.00 35.28 54.00 237.00
## --------------------------------------------------------
## prosper_loan$ProsperRating..Alpha.:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 34.0 78.0 116.1 158.0 913.0
It appears that larger investors who look to fund a whole loan are more attracted to the A-B-C range of Prosper ratings, where the first quartile of investors per loan is 1. On the other hand, AA profiles appear to be a sweet spot for multiple investors seeking lower-risk, lower-yielding loans. While a large proportion of loans may be funded wholly by one investor today, this was not always the case. Before 2013, more than 90 percent of loans were funded by multiple investors. This dramatic shift is most likely due to large investors’ increasing interest in online lending and their desire to accumulate substantial portfolios.
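A first quartile of 1 investor, as seen for grades A, B and C above, means that at least a quarter of those loans were funded by a single investor. A minimal sketch of how that share could be measured directly is shown here; the values are invented, and `ProsperRating` is a simplified stand-in for the `ProsperRating..Alpha.` column.

```r
# Toy data, invented for illustration (NOT the Prosper data).
toy <- data.frame(
  ProsperRating = factor(c("AA", "AA", "AA", "AA", "B", "B", "B", "B"),
                         levels = c("AA", "B")),
  Investors     = c(60, 150, 270, 1, 1, 1, 19, 110)
)

# Share of loans in each rating that were funded by exactly one investor.
single_share <- tapply(toy$Investors == 1, toy$ProsperRating, mean)
single_share

# The same by()/summary() pattern as in the output above.
by(toy$Investors, toy$ProsperRating, summary)
```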
This plot shows the relationship between Lender Yield and loan amount, with Lender Yield shown separately for each Prosper rating. The higher the risk, the smaller the loan that can be issued. Borrowers graded A and B take most of the maximum-size loans. Defaulted loans also concentrate at higher interest rates. The riskiest borrowers, graded HR, are only approved for loans under 5,000 dollars.
While D, E and HR are the genuinely high-risk types of loan, many investors seem to be interested in those categories. The region of the plot with loan amounts under 10,000 USD and fewer than 100 investors is quite crowded. This could be explained by the fact that riskier loans yield more in return. Furthermore, single-investor loans, in which one investor funds the whole loan, are spread across all amounts and seem to attract a large number of lenders in the market. Overall, investors seem uninterested in sharing high-risk investments with too many other people.
The distribution of Lender Yield by Prosper Rating clearly changed from year to year. Lender Yield becomes more neatly separated by grade year over year, showing the improvement in Prosper’s risk-control system.
The number of investors on each loan is also associated with Prosper Rating and Lender Yield: loans shared among fewer investors tend to be those with a high rate, and they are spread across all values of Debt to Income Ratio (DTI).
Yes, there were. As I found out, the higher the risk of a loan, the smaller the amount that can be issued. As we can see, HR-rated borrowers are only accepted for loans under 5,000 USD. On the other hand, larger loan requests go with better ratings.
The graph above briefly summarises Prosper loan activity from 2005 to 2014. From 2006 to 2009, Prosper determined loan rates using an auction system. Following its SEC registration in 2009, the company created a new model in which rates are determined by a formula evaluating each prospective borrower’s credit risk, the “Prosper Rating”. That is why the Prosper Rating only appears from 2009 onward in the graph. Prosper’s sales volume also dropped significantly in 2009 due to the SEC event. 2013 saw remarkable growth in Prosper’s sales, due to its own improvements and substantially increasing awareness among borrowers and investors.
One thing to note here: the data for 2014 only covers the first three months (January, February and March), which is why the total shown in the plot is unusually low. Yet the volume for those three months alone nearly reaches the total sales of 2012 and almost half of the total for 2013, which indicates that Prosper keeps growing substantially year after year.
This plot shows the strong relationship between Prosper Rating and Lender Yield. Investors can look at it to get a basic understanding of how a portfolio works.
Debt to Income Ratio also indicates the health of a portfolio. This plot gives an overview of the Prosper loan market: which factors investors care about most and how others put money into their investments.
This project took much more time than I expected at first. Approaching the dataset, I had no idea what information it was conveying; every variable and every figure seemed strange to me, since I have no experience working in the financial industry. Realizing I first needed to gain some knowledge about finance and p2p lending platforms, I did a lot of research and read about Prosper as well as Lending Club, trying to comprehend the patterns behind the data, draw insight from it and learn how to make a real investment as an investor.
The more research I did, and the more I learned how to calculate and predict the return on each loan from different variables, the more sense the Prosper data made to me. Each variable plays an important role in the whole picture. To really learn how to invest in loans effectively, I believe investors not only have to dive deep into the data provided but also need to gain a lot more practical experience along the way.
To investigate this dataset further, I would like to learn more about predictive modelling and try to predict the probability of default for each loan from the available observations.